
[DNM] feat(da): support fiber (not via c-node) #3244

Draft
julienrbrt wants to merge 37 commits into main from julien/fiber

Conversation

@julienrbrt
Member

@julienrbrt julienrbrt commented Apr 13, 2026

Overview

Supports the Fiber client (based on https://github.com/celestiaorg/celestia-app/blob/63fbf31cca216fc4e067a9e1b3a3431115c7009b/fibre), but not via celestia-node or apex for this PoC.
celestiaorg/celestia-node#4892

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 94adb153-a3c0-49fe-a05a-a24b17d355b6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



@github-actions
Contributor

github-actions Bot commented Apr 13, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
|---|---|---|---|---|
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Apr 28, 2026, 3:05 PM |

@claude
Contributor

claude Bot commented Apr 13, 2026

Claude encountered an error — View job


I'll analyze this and get back to you.

@codecov

codecov Bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 90.65657% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.16%. Comparing base (2865d6d) to head (4485d91).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| block/public.go | 0.00% | 12 Missing ⚠️ |
| block/internal/da/fibremock/mock.go | 90.90% | 5 Missing and 5 partials ⚠️ |
| block/internal/da/fiber_client.go | 96.74% | 5 Missing and 3 partials ⚠️ |
| pkg/sequencers/solo/sequencer.go | 61.53% | 5 Missing ⚠️ |
| pkg/config/config.go | 75.00% | 2 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #3244      +/-   ##
==========================================
+ Coverage   62.33%   63.16%   +0.82%
==========================================
  Files         122      124       +2
  Lines       12873    13258     +385
==========================================
+ Hits         8024     8374     +350
- Misses       3968     3995      +27
- Partials      881      889       +8
```
| Flag | Coverage Δ |
|---|---|
| combined | 63.16% <90.65%> (+0.82%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

julienrbrt and others added 7 commits April 14, 2026 15:12
Adds a fibremock package with:
- DA interface (Upload/Download/Listen) matching the fibre gRPC service
- In-memory MockDA implementation with LRU eviction and configurable retention
- Tests covering all paths

Migrated from celestiaorg/x402-risotto#16 as-is for integration.
@julienrbrt julienrbrt changed the title feat(da): support fiber (not via c-node) [DNM] feat(da): support fiber (not via c-node) Apr 20, 2026
julienrbrt and others added 15 commits April 20, 2026 14:46
Adds tools/celestia-node-fiber, a new Go sub-module that implements the
ev-node fiber.DA interface by delegating Upload, Download and Listen to a
celestia-node api/client.Client.

Upload and Download run locally against a Celestia consensus node (gRPC)
and Fibre Storage Providers (Fibre gRPC) — no bridge-node hop — using
celestia-node's self-sufficient client (celestiaorg/celestia-node#4961).
Listen subscribes to blob.Subscribe on a bridge node and forwards only
share-version-2 blobs, which is how Fibre blobs settle on-chain via
MsgPayForFibre.

The package lives in its own go.mod, parallel to tools/local-fiber, so
ev-node core does not inherit celestia-app / cosmos-sdk replace-directive
soup. A FromModules constructor accepts the Fibre and Blob Module
interfaces directly so callers can inject mocks or share an existing
*api/client.Client.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#3280)

* test(celestia-node-fiber): showcase end-to-end Upload/Listen/Download

Adds tools/celestia-node-fiber/testing/, a single-validator in-process
showcase that boots a fibre-tagged Celestia chain + in-process Fibre
server + celestia-node bridge, registers the validator's FSP via
valaddr (with the dns:/// URI scheme the client's gRPC resolver
expects), funds an escrow account, and drives the full adapter
surface.

TestShowcase proves the round-trip: subscribe via Listen, Upload a
blob, wait for the share-version-2 BlobEvent that lands after the
async MsgPayForFibre commits, assert the BlobID from Listen matches
Upload's return, Download and diff the payload bytes.

The harness is intentionally single-validator — a 2-validator
Docker Compose showcase is planned as a follow-up for exercising real
quorum collection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(celestia-node-fiber): scale showcase to 10 blobs, document DataSize gap

Upload 10 distinct-payload blobs through adapter.Upload, collect
BlobEvents via adapter.Listen until every BlobID is accounted for
(order-insensitive, rejects duplicates), then round-trip each blob
through adapter.Download to diff bytes. Catches routing bugs (wrong
blob returned for a BlobID) and duplicate-event bugs that a
single-blob test can't see.

Scaling the test also exposed a semantic issue: the v2 share carries
only (fibre_blob_version + commitment), so b.DataLen() — what
listen.go's fibreBlobToEvent reports today — is always 36, not the
original payload length ev-node's fibermock conveys. The adapter
can't derive the payload size from the subscription stream alone;
surfacing it correctly needs an x/fibre PaymentPromise lookup
(tracked as a TODO on fibreBlobToEvent). The test therefore asserts
DataSize is non-zero rather than matching len(payload).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…3281)

listen.go previously set BlobEvent.DataSize to b.DataLen(), which for
a share-version-2 Fibre blob is always the fixed share-data layout
(fibre_blob_version + commitment = 36 bytes) — not the original
payload length. That diverges from ev-node's fibermock contract and
misleads any consumer that uses DataSize to allocate buffers or
report progress.

The v2 share genuinely doesn't carry the original size, and x/fibre
v8 has no chain query to derive it from the commitment. The only
accurate path is to Download the blob and measure. Listen now does
exactly that before forwarding each event. The cost is one FSP
round-trip per v2 blob; can be made opt-out later if it hurts
throughput-sensitive use cases.

Tests:
- Showcase restores the strict DataSize == len(payload) assertion
  across all 10 blobs.
- Unit test TestListen_FiltersFibreOnlyAndEmitsEvent now stubs
  fakeFibre.Download to return a deterministic payload and asserts
  DataSize matches its length.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ight subscriptions (#3283)

feat(celestia-node-fiber): Listen takes fromHeight for resume subscriptions

Threads a fromHeight parameter through the Fibre DA Listen path so a
subscriber can rejoin the stream from a past block height without
missing blobs. Consumes the matching celestia-node API change landed
in celestiaorg/celestia-node#4962, which gave Blob.Subscribe a
fromHeight argument backed by a WaitForHeight loop.

Changes:

- block/internal/da/fiber/types.go: DA.Listen signature now takes
  fromHeight uint64. fromHeight == 0 preserves "follow from tip"
  semantics, >0 replays from that block forward.
- block/internal/da/fibremock/mock.go: replay matching blobs with
  height >= fromHeight before attaching the live subscriber.
- block/internal/da/fiber_client.go: outer fiberDAClient.Subscribe
  does not yet expose a starting height (datypes.DA doesn't plumb
  one), so pass 0 and defer resume-from-height wiring to a future
  datypes.DA change.
- tools/celestia-node-fiber/listen.go: propagate fromHeight to
  client.Blob.Subscribe on the celestia-node API.
- tools/celestia-node-fiber/go.mod: bump celestia-node to the merged
  pseudo-version (v0.0.0-20260423143400-194cc74ce99c) carrying #4962.
- tools/celestia-node-fiber/adapter_test.go: fakeBlob.subscribeFn
  gets the new fromHeight arg; TestListen_FiltersFibreOnlyAndEmitsEvent
  asserts that fromHeight=0 is forwarded.
- tools/celestia-node-fiber/testing/showcase_test.go: existing
  TestShowcase passes fromHeight=0. New TestShowcaseResume uploads 3
  blobs, discovers their settlement heights via a live Listen, then
  opens a fresh Listen with fromHeight at the first blob's height and
  verifies every historical blob is replayed with correct Height and
  DataSize.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
walldiss and others added 2 commits April 27, 2026 14:15
…imental (#3289)

Picks up the chained celestia-app bump on celestia-node
feature/fibre-experimental, which carries the x/valaddr host:port
validation fix (celestia-app PR #7183).

Cascading changes required by the bump:

- celestia-app v8 → v9 across adapter.go, adapter_test.go, listen.go,
  testing/network.go, testing/bridge.go (the new celestia-node uses
  v9, so the consumer must too).
- testing/network.go drops the `dns:///` prefix from the in-process
  validator registration. The new x/valaddr ValidateBasic enforces
  host:port form, so `dns:///host:port` registrations are now rejected
  at tx time. gRPC's passthrough resolver dials bare `host:port`
  directly with no behavioural difference.

Verified locally:
  go vet -tags fibre ./...                       — clean
  go test -tags fibre -short -run TestShowcase   — pass

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(fiber-bench): single-sequencer ev-node bench against a remote Fibre network

Adds tools/celestia-node-fiber/cmd/fiber-bench, a self-contained binary that
spins up an ev-node aggregator wired to a Fibre network with the bridge node
bypassed, then pumps load into the in-mem mempool to measure throughput end-to-
end. Built specifically to flush out the ev-node-vs-Fibre regression where the
combined stack hits ~1k tps despite Fibre alone delivering ~1.3 GiB/s.

Stripped to keep the measurement clean:
- solo sequencer (no based / no forced inclusion)
- aggregator-only (no syncer, no P2P)
- in-mem core.Executor with constant state root (no state-machine cost)
- bridge-bypass cnfiber.Adapter (Upload via consensus gRPC + FSPs only)
- direct InjectTx (no HTTP overhead)

Includes:
- keyring management (test backend, test-only convenience for the bench account)
- Fibre escrow deposit/query helpers so the bench is self-contained
- per-Upload latency instrumentation (p50/p99/mean/max) so we can split
  Fibre-side latency from ev-node submitter serialization
- live periodic stats (tps + MB/s for inj/exec/da_settled streams) and a
  baseline summary at end of run

Build with -tags fibre — without it the celestia-app x/fibre messages aren't
registered in the codec and async pay-for-fibre settlement fails with "unable
to resolve type URL /celestia.fibre.v1.MsgPayForFibre".

* feat(common): default MaxBlobSize to Fibre's actual cap (128 MiB - 5 B)

The 5 MB default left ~25x of Fibre's per-blob capacity unused: Fibre's
MaxBlobSize is 1 << 27 bytes (128 MiB) and the protocol's per-blob header
is 5 bytes (1 byte version + 4 byte data size, see celestia-app/v9/fibre/
blob.go::blobHeaderLen and protocol_params.go::MaxBlobSize). Anchoring
ev-node's default to the actual cap lets each block carry the full
~128 MiB of payload, multiplying settlement throughput at the same
per-Upload latency.

Also drops the bench's executor.FilterTxs overhead margin: the cap
already lives at the right level (Fibre's MaxBlobSize), and reserving
extra in the executor would just leave bandwidth on the table again. If
proto/metadata overhead pushes a marshaled block over the cap, that
should be addressed in ev-node's block producer rather than worked
around in test fixtures.

The link-time override is kept for callers that want to constrain the
default further (smaller cap → smaller blocks → lower per-Upload
latency for environments where that matters).

* fix(block/executing): reserve proto/metadata overhead in RetrieveBatch's MaxBytes

The block producer was passing MaxBytes = MaxBlobSize directly to
GetNextBatch, but the marshaled types.Data (txs + Metadata + proto
framing) is larger than the sum of raw tx bytes. The per-tx proto
length-prefix is ~3 bytes, which is small in absolute terms but adds up
to 1.5% overhead at typical 200 B txs and over 1 MB of overhead at
peak block sizes (128 MiB). Without reserving this margin, a fully
packed batch builds a block that exceeds the submitter's MaxBlobSize
check and halts as 'unrecoverable: single item exceeds DA blob size
limit'.

Reserving in the block producer (rather than in FilterTxs) keeps the
executor's view of MaxBytes equal to the raw-tx budget, which is what
FilterTxs is meant to enforce.

* fix(reaper,cache): make seen-tx retention link-time tunable to avoid OOM under load

The seen-tx cache holds a SHA-256 hash for every transaction the reaper
ever drained. With CleanupInterval = 1h and DefaultTxCacheRetention =
24h hardcoded as consts, sustained throughput causes the map to grow
linearly without ever shrinking until the GC pressure or process
memory caps the run. Observed empirically while benchmarking the Fibre
DA path: at ~1.5M tx/s the bench OOM-killed after ~80 s with ~16 GB
RSS, the cache holding ~120 M entries.

Changing both to vars driven by ldflags lets ev-node keep its
production-friendly defaults (memory-cheap dedup over a 24 h window,
swept once an hour) while letting benchmark builds opt into shorter
windows so the cache reaches a steady state. Example for the
fiber-bench tool:

  go build -ldflags "\
    -X github.com/evstack/ev-node/block/internal/cache.defaultTxCacheRetentionStr=30s \
    -X github.com/evstack/ev-node/block/internal/reaping.cleanupIntervalStr=5s"

A real fix probably reaches further (cap entry count, switch to a TTL
cache implementation, or bypass dedup when the caller already
guarantees uniqueness) but these are larger conversations; the
ldflag knob unblocks measurement in the meantime.

* fix(fiber-bench/loader): backoff with sleep when mempool is full

The loader's drop path was runtime.Gosched + immediate retry, which lets
each worker allocate a fresh 200 B tx slice at ~200k iter/s when the
executor's mempool channel is permanently full. With --workers=8 that
is 1.6 M short-lived allocations/s = ~320 MB/s of GC churn, against
nothing useful — the rejected slices never make it into a block.

Sleeping 100 us on a failed InjectTx caps the per-worker drop rate at
~10k/s and makes total allocation pressure scale with --workers as a
proportional backpressure signal rather than a constant maximum-rate
spin. Drops in the live stats line still grow visibly when the mempool
is full, just at a sane rate.

Without this fix the bench OOM-killed under sustained load even with
--max-pending=4 throttling block production: pending blob memory was
bounded but GC could not keep up with the loader's allocation rate
fast enough to prevent runaway heap growth alongside Badger's L0
backlog and ev-node's pending caches.

* fix(common): make defaultMaxBlobSizeStr a string literal so -ldflags -X works

A previous version initialised the variable via strconv.FormatUint(...),
which Go's linker treats as a non-constant expression — so -ldflags -X
silently no-ops the override. Every benchmark that tried to set a
smaller MaxBlobSize at link time was actually running with the 128 MiB
default, masking what we were measuring.

The correct form is a plain string literal in the source. The Fibre
cap is documented in the comment so the magic number stays
self-explanatory; init() still parses and falls back to the literal
value if parsing fails.

* docs: TODO(throughput-cleanup) on the DA-blob-vs-raw-tx-budget conflation

common.DefaultMaxBlobSize is plugged into two semantically different
limits — the raw-tx budget that gates FilterTxs and the marshaled
ceiling that gates submitter retries — and the conflation has been the
root cause of more than one bug while debugging Fibre throughput
(packed blocks marshaling larger than MaxBlobSize, ad-hoc 2%
reservations in RetrieveBatch, etc.). File three TODOs pointing at
each other and at the umbrella note in common/consts.go so the next
person picking this up can do the cleanup atomically rather than
adding more workarounds.

No behavioral change.

* perf(fiber-da): skip flatten allocation on single-item Submit; honor ctx

Three changes in fiber_client.go::Submit, all hot-path correctness/
efficiency wins surfaced while debugging Fibre throughput:

1. Single-item fast path that bypasses flattenBlobs.
   For data blobs, limitBatchBySize already caps each Submit call at
   one item (each block's data already saturates MaxBlobBytes). The
   flatten step was therefore allocating MaxBlobSize bytes and
   memcpy'ing the entire payload solely to prepend the 8-byte
   count/length prefix used by splitBlobs. At 128 MiB blocks that's
   ~128 MB held in two places at once during every Upload. The fast
   path passes data[0] straight through and saves the full copy.

   Wire-format caveat: a retriever (full-node syncer or light client)
   downloading a blob written via this fast path can't decode it —
   splitBlobs always expects the prefix. The right fix is to pair
   this with a per-item Upload model so flatten falls away entirely;
   tracked as a TODO in the source pointing at the concurrent-uploads
   work where that lands naturally.

2. Honor caller's ctx in Upload.
   The previous context.Background() kept Uploads alive past node
   shutdown and was the proximate cause of the "payment promise
   already processed" warnings — a stale Upload would settle on-chain
   after ev-node had already moved on. Threading the caller's ctx
   makes shutdown promptly cancel in-flight Uploads.

3. Correct SubmittedCount on error.
   On a full-Upload failure the result reported len(data)-1 as
   submitted, which both reads weirdly for len==1 (uint64 underflow
   risk in any future arithmetic) and lies to submitToDA's
   prefix-of-success retry advance. Reset to 0 on error.

No behaviour change for the multi-item retrieve path (flatten still
runs when len > 1). Validated via go build / go vet.

* perf(fiber-da): per-item concurrent Uploads on Submit

Fan out one goroutine per item in fiber DA Submit, calling fiber.Upload
concurrently with the caller's ctx. Settlement throughput now scales
linearly with the batch size: previously ev-node's submitter could
only have one Upload in flight per stream (header + data, mutex-locked
in submitter.go), and each Submit further serialized the batch into
one big flatten-encoded blob. With fan-out, a Submit of N items
becomes N concurrent Uploads, and Fibre's ~1.5 s per-Upload latency
amortizes across N.

The result-aggregation honors submitToDA's "prefix of successes"
contract: SubmittedCount = N means items [0..N) succeeded and the
caller will retry [N..end). Reporting interleaved successes would
double-submit blobs and waste escrow; matching prefix semantics keeps
the retry contract intact even when individual Uploads fail
out-of-order.

Pair changes in submitting/da_submitter.go:
- limitBatchBySize gains a maxItems cap (was total-bytes-only). Each
  item is still bounded by maxItemBytes (chain ceiling), but the
  total batch is now bounded by item count, letting multiple
  full-size items flow through one Submit.
- retryPolicy adds MaxItems with a sensible non-fiber default of 1
  (preserves legacy single-item-per-Submit semantics for backends
  that flatten a batch into one blob).
- For the fiber backend, MaxItems is bumped to 16 — covers a 5 min
  run at 1 b/s production with 4–8 pending blocks while leaving
  headroom for memory pressure under MaxBlobSize-sized items.

Wire-format follow-up (see TODO in fiber_client.go::Submit): the
retrieve path in this file still uses splitBlobs which assumes the
old single-prefixed-blob format. Per-item Uploads now produce raw
blobs with their own BlobIDs; retrieve needs an update to read each
BlobID separately. The bench's aggregator-only setup never invokes
retrieve so this is unblocked for measurement but blocks merging to
production until addressed.

* perf(fiber-bench): use in-memory KV store, not disk-backed Badger

Block production calls store.batch.Commit() synchronously inside
ProduceBlock — which means Badger's write throughput is a hard ceiling
on block production rate. At 128 MB blocks × ~1 b/s the on-disk
backend generates ~150 MB/s of value-log writes plus heavy compaction
churn that backed up under load: vlog files filled (~1.2 GB each)
faster than Badger could rotate, and we hit a "file exists" race on
.vlog rotation that wedged the producer entirely.

The bench has no durability requirement — if it crashes we re-run —
so swap to NewTestInMemoryKVStore. ev-node's code path is unchanged
(same Batch / Commit semantics), the data just lives in a map. This
removes Badger from the critical path and lets the bench measure
ev-node's actual pipeline rather than Badger's write-amplification
curve.

Open question for production fiber rollups: since Fibre IS the
storage (a fiber-only node can re-sync any block from the chain),
does ev-node need to persist block data to local Badger at all?
Possibly worth a fiber-only-skip-block-store mode in the executor,
analogous to how the !fiber broadcast paths are gated. Filed
informally; not blocking the throughput investigation.

* fix(fiber-bench): use ds.MapDatastore, not Badger in-memory

Previous in-memory switch used store.NewTestInMemoryKVStore() which is
backed by Badger with WithInMemory(true). That mode still enforces
Badger's default 1 MiB ValueThreshold, so any block larger than 1 MiB
fails to save with:

  Value with size 133506229 exceeded 1048576 limit

Our 128 MiB blocks blow past this on every commit. Symptom in the
logs is a stream of 'failed to save block data' errors while the
submitter continues to upload pending items from cache — so settlement
keeps advancing for already-cached items but new block production
halts.

Swap to ds.MutexWrap(ds.NewMapDatastore()): a pure-Go in-memory map
with no per-value size limit, thread-safe via the standard sync
wrapper. Same Batch / Commit semantics ev-node expects, just a thin
sync.Mutex around a Go map.

The bench has no durability requirement — the Badger reference is
kept aliased above the assignment so the dependency stays imported
in case we want to switch back via flag later.

* hack(store): swap NewDefaultKVStore to in-memory MapDatastore

Block production calls store.batch.Commit() synchronously inside
ProduceBlock, so storage write throughput is a hard ceiling on block
production. With 128 MiB blocks × ~1 b/s the on-disk Badger backend
generates ~150 MB/s of value-log writes plus heavy compaction; under
sustained load we hit a Badger .vlog rotation race ("file exists")
that wedges the producer entirely.

Returning a sync-wrapped MapDatastore from the canonical constructor
(rather than special-casing the bench) puts the change exactly where
ev-node loads its store, makes the diff small and obvious, and lets
the bench drop its private MapDS swap to call NewDefaultKVStore the
same way every other ev-node binary does.

The HACK comment names three real fixes — async commit, fiber-only
skip-persistence, write-optimised backend — so this isn't read as
"revert to Badger before merge". NewDefaultKVStoreOnDisk preserved as
the literal Badger constructor for any caller that explicitly wants
disk-backed state today.

Reverts the bench-side workaround introduced in 7ed0bf1.

* hack(reaper,cache): collapse seen-tx TTL plumbing back to plain consts

Previous fix (ecd7f62) made DefaultTxCacheRetention and CleanupInterval
ldflag-overridable so the bench could shrink them at link time. That
hid the actual change behind 30 lines of init() / parsing scaffolding —
the diff said "add tunable" but the operational story was "the default
is wrong for any meaningful TPS". Replacing the plumbing with two
const edits puts the hack where it belongs, where the value lives.

DefaultTxCacheRetention: 24h -> 30s.
At ~1.5M tx/s sustained the 24h dedup window grows the cache to ~16 GB
in under a minute (each entry is the SHA-256 hex string, ~150 B in map
representation), which OOM-kills the bench before any throughput
signal is visible. The HACK comment flags 24h as itself wrong:
retention-by-wall-time scales poorly with TPS. The proper fix is an
LRU-by-count cache, or expressing the window in DA blocks (mempool
TTL × DA block time), not a fixed duration.

CleanupInterval: 1h -> 5s.
Coupled to the previous 24h retention; an hourly sweep against a 24h
window means entries can outlive expiry by 1h (fine when retention is
days, completely broken at 30s retention where entries would survive
12× past expiry). The HACK comment notes this should derive from
retention rather than be a separate fixed value.

Reverts the link-time tunability scaffolding from ecd7f62. The bench
no longer needs ldflags for these — same hack with the standard
build.

* docs: surface follow-up issues left by the throughput hacks

Three small comment / dead-code edits. None change behaviour; they
make hidden assumptions visible so the next person reading the diff
doesn't trip on them.

block/internal/common/consts.go
  DefaultMaxBlobSize: flag that the new 128 MiB-5 default is correct
  for fiber-enabled deployments but WRONG for the legacy JSON-RPC
  blob client path — bridge / chain reject blobs above their own
  much smaller cap. The right shape is per-backend caps; the global
  default was always going to be a leaky abstraction.

block/internal/da/fiber_client.go
  Remove flattenBlobs (dead code now that Submit fans out per item).
  Keep splitBlobs but document loudly that it can no longer decode
  blobs THIS branch's Submit writes — the per-item Upload path
  produces raw blobs while splitBlobs expects the legacy "count +
  per-item length" framing. Retrieve / Get / Subscribe callers in
  the same file are therefore broken for our writes; the comment
  points at the wire-format follow-up that has to land before any
  node on this branch tries to sync from another.

block/internal/submitting/da_submitter.go
  fiberDefaultBatchItems = 16: flag the magic number as needing a
  config knob (FiberDAConfig.UploadConcurrency was scaffolded for
  exactly this earlier and reverted; wire it through here when the
  concurrent-uploads change graduates from prototype). 16 is a
  pragmatic measurement default, not a considered production value.

* refactor(fiber-bench): delegate node wiring to rollcmd.StartNode

The bench was hand-rolling the same node wiring testapp/evm/grpc
apps already do via pkg/cmd.StartNode — DA client construction, p2p
client setup, node.NewNode call, signal handling, the run loop. Each
of those grew its own way of doing things in the bench, drifted from
the canonical path, and left a maintenance gap if cmd.StartNode ever
gained a new responsibility (which is exactly how the fiberClient
parameter regression on this branch happened — testapp was never
updated to pass it).

Replace the inline wiring with one rollcmd.StartNode call. The bench
now owns only what's genuinely bench-specific:

  - Cosmos keyring open + bridge-bypass cnfiber.Adapter (no
    production equivalent — bypasses bridge node dialing)
  - Block-signing key created in homedir, passphrase written to a
    temp file so StartNode can read it through its standard flag
  - inMemExecutor + solo sequencer (constant state root for
    measurement; testapp's KVExecutor recomputes state by scanning
    every key, O(N) per block)
  - Loader + stats printer goroutines spawned before the blocking
    StartNode call; SIGINT-to-self triggers shutdown when the
    duration timer expires (StartNode's outer select waits on
    signal/err only — not ctx — so this is the contained way to
    drive duration through its existing shutdown path).

Net diff: ~30 LOC fewer, but the meaningful change is that the bench
is no longer carrying its own copy of testapp/evm/grpc's node setup.
The bridge-bypass adapter, instrumented Upload latency proxy, escrow
helpers, and stats printer remain (those don't duplicate canonical
ev-node code; they exist only for measurement and operator UX).

Filing for follow-up: testapp/evm/grpc apps still don't compile on
this branch because cmd.StartNode gained the fiberClient parameter
without updating its callers. The right fix is one of:
  - testapp/cmd/run.go imports tools/celestia-node-fiber and wires
    cnfiber.New (with bridge) when nodeConfig.DA.Fiber.Enabled.
  - Or cmd.StartNode grows a constructor-style overload so callers
    that don't use Fiber can keep their old signature.
Either way, that's a separate piece of work; this commit just
demonstrates the canonical pattern from the bench side.

* fix(apps): unblock testapp/evm/grpc compile by passing nil fiberClient

The fiberClient parameter was added to pkg/cmd.StartNode in commit
87573ae (on this branch's parent julien/fiber) but the three apps
that call it were never updated. Branch HEAD therefore had three
broken compiles — anyone trying to build a testapp / evm / grpc
binary on this branch hit:

  cmd/run.go: not enough arguments in call to cmd.StartNode

Pass nil for the new parameter in each app and document why with a
TODO pointing at tools/celestia-node-fiber. None of the three apps
currently need fiber DA support — they pre-date this branch's fiber
work — and the right way to add it is to construct a
*cnfiber.Adapter from nodeConfig.DA.Fiber and pass it through, the
same pattern fiber-bench's run.go uses (see commit 57fa859). That
work is out of scope for this commit; this is just the "stop the
bleed" change so the branch builds cleanly.

Three identical comment blocks across the three apps so anyone
landing in any one of them sees the same context.

* refactor(fiber-bench): reuse canonical config flags via rollconf.AddFlags

The bench's runFlags struct had grown ~22 cobra flags, ~15 of which
were straight aliases for things rollconf.AddFlags already registers
(--block-time → --evnode.node.block_time, --batching-strategy →
--evnode.da.batching_strategy, --consensus-grpc →
--evnode.da.fiber.consensus_address, etc.). Each alias was its own
maintenance liability — defaults drifted from the canonical defaults,
new ev-node config fields didn't surface here without manual sync, and
operators learned a bench-specific flag dialect that didn't transfer
to testapp/evm/grpc.

Drop the aliases. Run command now calls:

  rollconf.AddGlobalFlags(root, AppName + "/node")  // --home, --evnode.log.*
  rollconf.AddFlags(runCmd)                         // --evnode.node.*, etc.
  rollcmd.ParseConfig(cmd) → rollcmd.SetupLogger(cfg.Log)

…then post-parse forces what the bench requires (Aggregator, Fiber.Enabled,
P2P.ListenAddress, Signer.SignerType, Pprof off, Prometheus on, BridgeAddress
placeholder for FiberDAConfig.Validate) and overrides canonical defaults
that are wrong for a throughput bench (DA block time → 1s, batching →
immediate, scrape interval → 100ms, namespaces → fb-bench-{h,d}). Operator
flags always win — overrides only fire when cobra reports the flag wasn't
Changed.

Bench-local flags that survived: --duration, --workers, --tx-size,
--mempool-size, --stats-interval, --keep-home, --keyring-dir (cosmos
keyring; not the ev-node signer), --signer-passphrase (still writes a
temp file consumed by --evnode.signer.passphrase_file; commit 2 will
replace this with a real init flow).

Default home stays at ~/.fiber-bench/node (passed as
\"fiber-bench/node\" to AddGlobalFlags) so the os.RemoveAll(cfg.RootDir)
on --keep-home=false runs cannot clobber the cosmos keyring at
~/.fiber-bench/keyring. Updated run-bench.sh and README to use the
canonical --evnode.* flag names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(fiber-bench): inline loader backoff, drop yield.go

yield.go was a single-line wrapper around time.Sleep(100us) parked in
its own file with a long explanatory comment. The comment moves up to
the loaderBackoff const in loader.go (the only caller), the file goes
away. No behavioural change.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
